[57] ABSTRACT

A microprocessor for efficient processing of instructions in
a program flow including a conditional program flow control
instruction, such as a branch instruction. The conditional
program flow control instruction targets a first code section
to be processed if the condition is resolved to be met, and a
second code section to be processed if the condition is
resolved to be not met. A fetch unit fetches instructions to be
processed and branch prediction logic coupled to the fetch
unit predicts the resolution of the condition. The branch
prediction logic of the invention also determines whether
resolution of the condition is unlikely to be predicted
accurately. Stream management logic responsive to the
branch prediction logic directs speculative processing of
instructions from both the first and second code sections
prior to resolution of the condition if resolution of the
condition is unlikely to be predicted accurately. Results of
properly executed instructions are then committed to
architectural state in program order. In this manner, the
invention reduces the performance penalty related to
mispredictions.

29 Claims, 6 Drawing Sheets

PROCESSOR AND METHOD FOR SPECULATIVELY EXECUTING
INSTRUCTIONS FROM MULTIPLE INSTRUCTION STREAMS
INDICATED BY A BRANCH INSTRUCTION

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to the field of computer systems.
Specifically, the invention relates to efficient processing of
instruction streams which include conditional program flow
control instructions, such as branch instructions.

2. Description of Related Art

Many microprocessors employ a technique known as
hardware pipelining to increase instruction throughput by
processing several instructions through different phases of
execution concurrently. To maximize instruction execution
efficiency, it is desirable to keep the instruction execution
pipeline full (with an instruction being processed in each
pipeline stage) as often as possible such that the pipeline
produces useful output every clock cycle. However, whenever
there has been a transfer of program flow control to
another section of software code and instructions have been
speculatively fetched and processed and it is determined that
these instructions should not have been executed, the output
from the pipeline is not useful.

Exceptions and program flow control instructions, such as
branch instructions, provide examples of how the program
flow control can be changed. Branch instructions, which
may be conditional or unconditional and may transfer program
flow control to a preceding or subsequent code section,
are used for frequently encountered situations where a
change in program flow control is desired.

A conditional branch instruction determines instruction
flow based on the resolution of a specified condition. "If
A>B then branch to instruction X" is an example of a
conditional branch instruction. In this case, if A>B, program
flow control branches to a code section beginning with
instruction X, also referred to as the target code section. If
A is not greater than B, the instructions sequentially following
the branch instruction in the program flow, referred to as
the sequential code section, are to be executed.

Because pipelines in some microprocessors can be many
stages deep, conditional branch instructions are often
fetched before the condition specified in the branch instruction
is resolved. In this case, the processor cannot reliably
determine whether or not the branch will be taken, and thus,
cannot decide from which code section to fetch subsequent
instructions. In many processors, branch prediction logic
operates to predict the outcome of a particular branch
instruction based on a predetermined branch prediction
approach. Instructions are then speculatively fetched from
either the target code section or the sequential code section
based on the prediction indicated by the branch prediction
logic.

Although branch prediction accuracy may be improved or
tuned by using different branch prediction algorithms,
mispredictions still occur. By the time a misprediction is
identified, many instructions from the incorrect code section
may be in various stages of processing in the instruction
execution pipeline. On encountering such a misprediction,
instructions following the mispredicted conditional branch
instruction in the pipeline (or multiple pipelines) are flushed,
and instructions from the other, correct code section are
fetched. Flushing the pipeline creates bubbles or gaps in the
pipeline. Several clock cycles may be required before the
next useful instruction completes execution, and before the
instruction execution pipeline produces useful output.

Because conditional branch instructions and other similar
program flow control instructions are prevalent in software
applications code (in some cases, they are encountered as
frequently as one branch instruction for every five instructions
processed), the cumulative microprocessor performance
penalty caused by mispredictions can be
significant, even where branch prediction accuracy is
relatively high. Previous processors do not provide a means for
identifying which branch instructions are unlikely to be
predicted accurately such that mitigating measures may be
taken.

Thus, it is desirable to have a means for reducing or
eliminating the performance penalty related to mispredicting
the outcome of program flow control operations to provide
for more efficient instruction execution. Further, it is
desirable to have a means for identifying program flow control
instructions that are unlikely to be predicted accurately such
that preventive measures may be selectively utilized for the
particular instructions that are unlikely to be predicted
accurately.

SUMMARY OF THE INVENTION

A microprocessor for efficient processing of instructions
in a program flow including a conditional program flow
control instruction, such as a branch instruction, is
described. The branch instruction has a condition to be
resolved and indicates a first code section to be processed if
the condition is resolved to be met, and a second code
section to be processed if the condition is resolved to be not
met. The processor includes a fetch unit that fetches
instructions, branch prediction logic coupled to the fetch
unit that predicts the resolution of the condition and
determines whether resolution of the condition is unlikely to be
predicted accurately. The processor also includes stream
management logic responsive to the branch prediction logic
that directs speculative processing of instructions from both
the first and second code sections prior to resolution of the
condition if resolution of the condition is determined to be
unlikely to be predicted accurately.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be understood more fully from the
detailed description given below and from the accompanying
drawings of embodiments of the invention, which,
however, should not be taken to limit the invention to the
specific embodiments, but are for explanation and
understanding only.

FIG. 1 is a high-level block diagram of the computer
system in accordance with one embodiment of the invention.

FIG. 2 is an illustration of instructions in a program flow
as processed by one embodiment of the invention.

FIG. 3 is a block diagram of the processor organization of
one embodiment of the invention.

FIG. 4 illustrates features of the branch prediction unit of
one embodiment of the invention in tabular form.

FIG. 5 illustrates the stream tables of one embodiment of
the invention.

FIGS. 6A and 6B illustrate the method of one embodiment
of the invention.

DETAILED DESCRIPTION OF THE INVENTION

A processor and method for efficient processing of
instruction streams which include a conditional program
flow control instruction are described. The operation of the
invention is described herein with reference to conditional
branch instructions for simplicity, although it is understood
that other types of program flow control instructions are
within the scope of the invention. Further, in the following
description, numerous specific details are set forth, such as
specific functional blocks, instruction formats, tables, etc., in
order to provide a thorough understanding of the invention.
However, it will be appreciated by one skilled in the art that
the invention may be practiced without these specific details.
In other instances, well-known structures, circuit blocks, and
architectural functions have not been described in detail in
order to avoid obscuring the invention.
Program Flow Control Instructions and Nomenclature

FIG. 2 illustrates, in tabular form, an example of several
instructions in a program flow to clarify terminology used in
the following description. The instructions in the program
flow are divided into code sections. The term "code section"
is used herein to refer to a block of code between two
program flow control instructions, and which includes the
second of the two program flow control instructions as its
last instruction.
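
The definition of a code section given above can be illustrated with a short Python sketch. It is not part of the described hardware; the mnemonics in FLOW_CONTROL, the (address, mnemonic) tuple format, and the function name are hypothetical choices made only for illustration.

```python
# Hypothetical sketch: divide a program flow into code sections.
# Each section ends with (and includes) a program flow control instruction.
FLOW_CONTROL = {"branch", "jump", "call", "return"}  # assumed mnemonics


def split_code_sections(instructions):
    """Return a list of code sections from (address, mnemonic) pairs."""
    sections, current = [], []
    for address, mnemonic in instructions:
        current.append((address, mnemonic))
        if mnemonic in FLOW_CONTROL:
            # The flow control instruction closes the current section.
            sections.append(current)
            current = []
    if current:
        # Trailing instructions not yet closed by a flow control instruction.
        sections.append(current)
    return sections


program = [
    (10, "add"), (11, "cmp"), (12, "branch"),
    (13, "load"), (14, "sub"), (15, "branch"),
]
sections = split_code_sections(program)
```

Here the first section ends with the branch at address 12, and the instructions at addresses 13 through 15 form the section that sequentially follows it.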

The code section including a particular program flow
control instruction being evaluated at a particular time is
referred to herein as the parent code section. The code
section 205 immediately following instruction 12 and the
code section 207 beginning with instruction 2000, which is
to be executed if the branch is resolved to be taken, are
referred to as child code sections. The labels "parent" and
"child" depend on the relationship between the code section
being evaluated and other code sections. Thus, the parent
code section 203 may also be a child code section relative to
one or more other code sections (not shown).

The code section 205 is referred to as the sequential child
code section in relation to the code section 203, as it
sequentially follows its parent code section 203. Similarly,
the code section 207 is referred to as the target child code
section in relation to the code section 203, as the first
instruction of the code section 207 is the branch target
instruction, instruction 2000, of the conditional branch
instruction 12 in the parent code section 203.

An instruction stream, as the term is used herein, is a flow
of instructions including one or more code sections. An
instruction stream including the target child code section
207 is a target instruction stream, and an instruction stream
including the sequential child code section 205 in relation to
a particular branch instruction is a sequential instruction
stream. As each conditional branch instruction is resolved,
the microprocessor identifies the correct code section (or
instruction stream including the code section) to be
committed to architectural state. For example, if the branch is
resolved to be taken, the results of execution from the target
child code section are committed to the processor state.

It should also be noted that, while the two code sections
indicated by a conditional program flow control instruction
are identified as target and sequential code sections which
are part of target and sequential instruction streams, it is
possible that a conditional program flow control instruction
will indicate two or more target code sections, neither of
which sequentially follows the program flow control
instruction. It will be appreciated by those of skill in the art that
although sequential code sections and sequential instruction
streams are referenced below for exemplary purposes, each
such reference is equally applicable to an alternate
non-sequential target code section or instruction stream as well.
Overview of the Invention

The invention provides for efficient execution of
instructions in a program flow including a branch instruction
having a condition to be resolved. The processor of the
invention includes means for predicting the resolution of the
condition and thus, the conditional branch instruction. The
processor of the invention also includes a means for
identifying conditions, and thus, conditional branch instructions
which are unlikely to be predicted accurately. In other
words, the invention identifies branch instructions which, in
relationship to other conditional branch instructions, have a
relatively high likelihood of being mispredicted. In one
embodiment, once a condition in a branch instruction is
identified as being unlikely to be predicted accurately, if
sufficient processor front-end resources are available, the
processor of the invention fetches and decodes instructions
from both the target and sequential (or second target)
instruction streams indicated by the conditional branch
instruction.

Then, if processor back-end resources are available,
instructions from both the target and sequential instruction
streams (which have been fetched and decoded) are
forwarded to the processor back-end for processing. Execution
resources of the processor back-end are shared between the
multiple instruction streams, providing for their concurrent
execution. Once the condition of the conditional branch
instruction is resolved, instructions from the incorrect
stream are canceled while instructions from the correct
instruction stream continue through any additional
processing stages. Valid, executed instructions are subsequently
retired and committed to architectural state. In this manner,
the performance penalty incurred for branch mispredictions
is significantly reduced if not eliminated.

In one embodiment, once instructions from both target
and sequential child instruction streams indicated by a
branch instruction have been fetched and decoded, if
sufficient processor back-end resources are not available for
concurrent processing of both instruction streams, the
instructions from the instruction stream predicted to be taken
are forwarded to the processor back-end for further
processing. Instructions from the other instruction stream which
have been fetched and decoded are placed in an instruction
buffer for temporary storage. In this case, if resolution of the
conditional branch instruction was mispredicted, only the
back-end of the processor pipeline is flushed instead of the
entire processor pipeline. Instructions from the correct
instruction stream are then immediately available in the
instruction buffer for continued processing in the processor
back-end such that the performance penalty incurred for a
misprediction is significantly reduced.
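
The two-step resource check described in the preceding paragraphs can be summarized in a small Python sketch. It is a simplified illustration rather than the claimed logic: the function names, the boolean inputs, and the string labels are all hypothetical.

```python
# Hypothetical sketch of the fork decision described in the overview.
def plan_branch_handling(hard_to_predict, front_end_free, back_end_free):
    """Choose how to process a conditional branch (labels are illustrative)."""
    if not hard_to_predict or not front_end_free:
        # Ordinary case: follow the predicted stream only.
        return "predicted_stream_only"
    if back_end_free:
        # Both streams are fetched, decoded, and executed concurrently.
        return "execute_both_streams"
    # Both streams are fetched and decoded, but only the predicted stream
    # executes; the other waits in the instruction buffer.
    return "execute_predicted_buffer_other"


def misprediction_recovery(plan):
    """Describe what must be discarded once the branch resolves."""
    return {
        "predicted_stream_only": "flush entire pipeline and refetch",
        "execute_both_streams": "cancel incorrect stream only",
        "execute_predicted_buffer_other": "flush back-end, resume from buffer",
    }[plan]
```

For example, `plan_branch_handling(True, True, False)` selects the buffered variant, for which a misprediction costs only a back-end flush.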

In an alternate embodiment, if a condition specified in a
branch instruction is identified as being unlikely to be
predicted accurately, availability of resources across the
processor pipeline is evaluated only once (as differentiated
from the embodiment described above where availability of
sufficient processor front-end resources is assessed prior to
forking and then availability of processor back-end
resources is assessed once the instructions have been fetched
and decoded). If sufficient processing resources are
determined to be available, instructions from both the target and
sequential instruction streams are fetched and processed
until the branch instruction indicating the streams is resolved
and the incorrect instruction stream is canceled.
Overview of the Computer System of the Invention 

Referring to FIG. 1, a computer system in accordance
with one embodiment of the invention is illustrated. The
computer system of the invention includes a system bus 100
for communicating information, a processor 101 coupled to
the bus 100 for processing information, a random access
memory (RAM) 102, also referred to as system memory or
main memory, coupled to the processor 101 for storing
information and instructions for the processor 101, and a
read only memory (ROM) 103 or other non-volatile storage
device coupled to the bus 100 for storing fixed information
and instructions for the processor 101. The computer system
of the invention also includes an external cache memory 106
for storing frequently and/or recently used information for
the processor 101. The cache memory 106 may be
configured within the same integrated circuit device package as the
processor 101, or in a separate device package. Other
components such as a mass storage device 108, a display
device 110 such as a printer or monitor, a keyboard 112 or
other input device, and a cursor control device 114 may also
be included in the computer system of the invention.

In one embodiment of the invention, the processor 101 is
an Intel Architecture microprocessor such as is
manufactured by Intel Corporation of Santa Clara, California. Other
processor architectures may also be used in accordance with
the invention. Further, it is appreciated by those skilled in
the art that other computer systems including additional
components not illustrated in FIG. 1, or configured without
components that are illustrated in FIG. 1, may also be used
to practice the invention.

The Processor of One Embodiment of the Invention

FIG. 3 illustrates a block diagram of the processor 101 of
one embodiment of the invention. The processor 101
includes an instruction pointer (IP) 302 for indicating the
address of instructions currently being fetched. A second
instruction pointer 303 is also included in one embodiment
for managing a second stream of instructions as described in
more detail below. A different number of instruction pointers
is included in other embodiments to manage concurrent
processing of a larger or smaller number of instruction
streams.

Fetch unit 304 has inputs coupled to the instruction
pointers 302 and 303 and operates to speculatively fetch
instructions from an instruction cache memory 301, or from
a next level of memory, such as the main memory 102 of
FIG. 1, if the instructions are not available in the instruction
cache memory 301. In one embodiment, the instruction
cache memory 301 includes two read ports such that two
fetch operations may be performed concurrently. Two virtual
read ports may be provided by supporting two interleaved
cache memory banks, each with one physical read port. In an
alternate embodiment, the instruction cache memory 301
includes a multiplexed read port such that the cache memory
can alternate between multiple read access request sources.

In one embodiment, a decode unit 306 is coupled to
decode instructions received from the fetch unit 304 into
operations which are more easily processed by the processor
101. One or more instruction buffers 307 are coupled to
receive instructions which have been fetched and decoded.
The instruction buffer 307 operates to temporarily store
instructions from one or both of the target or sequential child
instruction streams indicated by a conditional branch
instruction as discussed in more detail below. Each of the
above units along with stream management logic 109 is
included in the processor front-end 300 in one embodiment.
In alternate embodiments, the processor front- and
back-ends may include different components or the processor may
not be arranged in this manner.

The processor back-end 318 includes a renamer/allocation
unit 308 in one embodiment which is coupled to
the decode unit 306 and the instruction buffer 307. The
renamer/allocation unit 308 operates to perform register
renaming and allocation functions. This includes allocation
of space in a re-order buffer (within retirement and
writeback logic 326) for instruction retirement in embodiments
providing for dynamic execution or out-of-order instruction
processing. In alternate embodiments, the processor of the
invention does not include a rename unit.

A micro-operation queue 310 is coupled to the renamer/allocation
unit 308 in one embodiment, and operates to
dispatch micro-operations to execution logic 320 which may
contain ALUs, shifters, multipliers, and even data cache
memories to execute load operations. Logic blocks, such as
those discussed above, with the exception of the instruction
buffer and the instruction cache memory 301, are considered
to be part of the instruction execution pipeline, as they
perform various operations in the instruction execution
process. It should be noted that, in other embodiments, the
processor of the invention may not include all of the above
functional blocks. For example, the processor may not
operate on micro-operations and thus, certain functional
units related to micro-operations may not be included or
may perform different functions.

The back-end 318 of one embodiment also includes logic
blocks such as registers 324 for temporary information
storage, a data cache memory 328, and the retirement and
writeback logic 326 for retiring instructions and committing
the results of properly executed instructions to architectural
state. The back-end 318 of one embodiment also includes
additional registers or buffers 322 for storing data, flags
and/or context information for the processor 101.

THE STREAM MANAGEMENT LOGIC OF ONE EMBODIMENT

The processor 101 of the invention also includes stream
management logic 109 for managing processing of one or
more instruction streams concurrently in the instruction
pipeline of the processor 101. The stream management logic
109 of one embodiment includes branch processing and
prediction logic 316 and stream control logic 314. The
stream management logic 109 is coupled to logic blocks in
the instruction pipeline of processor 101 such as the fetch
304, decode 306, and renamer 308 units. The stream
management logic 109 is also coupled to the instruction pointer
302 and logic blocks in the back-end 318, such as the
retirement and writeback logic 326 and the buffers 322.

The branch processing and prediction logic 316 of one
embodiment of the invention includes a branch prediction
unit 336 and branch processing control logic 338. The
branch processing and prediction logic 316 of the invention
performs a number of different functions including branch
prediction and, in some embodiments, branch history
maintenance. The branch processing control logic 338 controls the
execution of branch instructions and provides the control
functions necessary for the operation of the branch
prediction unit 336.

The Branch Prediction Unit of One Embodiment 

FIG. 4 illustrates the branch prediction unit 336 of one
embodiment of the invention. Although specific fields and
types of information are illustrated and described in
reference to FIG. 4, it is appreciated by those of ordinary skill in
the art that other fields including different types of
information are also within the scope of the invention. Further, in the
example illustrated in FIG. 4, several fields which may be
included in the branch prediction unit 336 such as a tag field,
and a valid field, for example, are well-known in the art and
have not been shown in order to avoid obscuring the
invention.

In the embodiment illustrated in FIG. 4, branch prediction
unit entries are indexed according to an instruction address
although other configurations are within the scope of the
invention. The branch prediction unit 336 includes a
prediction field 410 which indicates whether it is more likely
that the branch indicated by the branch instruction being
evaluated will be resolved to be taken, or will be resolved to
be not taken. It is appreciated by those of ordinary skill in
the art that any one or a combination of branch prediction
approaches may be used in accordance with the invention to
determine the prediction information to be stored in the
branch prediction field 410.

The branch prediction unit 336 also includes a field 412
which indicates whether the resolution of the branch
instruction referenced by the particular entry in the branch
prediction unit is unlikely to be predicted accurately. A branch
instruction which is identified as being unlikely to be
predicted accurately is considered difficult to predict
accurately, or more likely to be mispredicted than some
other branch instructions as discussed in more detail below.
It is appreciated by those of skill in the art that it is resolution
of a condition associated with the branch instruction that is
predicted and identified as being unlikely to be predicted
accurately for branch instructions which are identified as
being unlikely to be predicted accurately.

The information stored in the field 412 assists in branch
processing by identifying conditional branch instructions
which are worthwhile to fork to avoid a potential branch
misprediction penalty. In other words, the invention
determines, based in part on the information stored in the
field 412, when it is an efficient use of microprocessor
resources to execute instructions from both the target
instruction stream and the sequential instruction stream.

Processing both instruction streams only if a conditional
branch instruction is identified as being unlikely to be
predicted accurately provides an important advantage. For
example, in some cases, conditional program flow control
instructions, such as branch instructions, may be
encountered as frequently as one out of every five instructions.
Assume, for purposes of illustration, that a processor
includes 10 pipeline stages with each stage containing four
instructions on average, and that one of every five
instructions is a conditional branch instruction. If the processor
were to fetch and execute instructions from both the target
and sequential instruction streams of every conditional
branch instruction encountered, the processor 101 would
need to provide resources to execute a very large number of
streams concurrently, considering the 40 instructions in
flight in the pipeline. In
this case, the instruction execution pipeline will become
highly inefficient and will quickly run out of resources. To
support concurrent processing of such a large number of
instruction streams, significant additional instruction
processing resources would need to be added to the processor.
By identifying the conditional branch instructions which are
unlikely to be predicted accurately, and only forking to
process both instruction streams indicated by the branch
instruction if the branch instruction is so identified, the
invention reduces or eliminates performance penalties
which may be incurred by branch mispredictions without
significantly increasing the hardware resources required for
efficient processing.
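
The arithmetic behind the illustration above can be worked through in a few lines of Python. The variable names are hypothetical, and the stream count is only a rough lower bound under the assumption that branches are counted along a single path; forked streams that themselves contain branches would raise the count further.

```python
# Worked arithmetic for the 10-stage, four-instructions-per-stage example.
PIPELINE_STAGES = 10
INSTRUCTIONS_PER_STAGE = 4
INSTRUCTIONS_PER_BRANCH = 5  # one conditional branch per five instructions

# Instructions in flight when the pipeline is full.
in_flight = PIPELINE_STAGES * INSTRUCTIONS_PER_STAGE

# Conditional branches that may be unresolved at the same time.
unresolved_branches = in_flight // INSTRUCTIONS_PER_BRANCH

# Assumption: each forked branch adds at least one extra stream alongside
# the original one, so this is a lower bound on concurrent streams.
min_concurrent_streams = unresolved_branches + 1

print(in_flight, unresolved_branches, min_concurrent_streams)
```

Under these assumptions, forking on every branch would require at least nine concurrently tracked streams, compared with the two instruction pointers of the embodiment shown in FIG. 3.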

The determination of whether a branch instruction is
unlikely to be predicted accurately, and thus, the
determination of the information to be stored in the field 412, may
be based on many different factors and may be determined
using a number of different approaches. In some
embodiments, the information considered to determine
whether a branch instruction is unlikely to be predicted
accurately is the same as, or similar to, the information used
to determine the predicted resolution of the branch
instruction. For example, in one embodiment, conditional branch
instructions are identified as being unlikely to be predicted
accurately if the particular branch instruction has not been
encountered before. In this case, the branch instruction is
identified as being unlikely to be predicted accurately based
on the fact that there is no entry in the branch prediction unit
336 for this instruction.

A branch instruction may be considered to be unlikely to
be predicted accurately based on information stored in a
branch history field such as the field 408. Information stored
in the branch history field of one embodiment indicates how
many times the resolution of the branch instruction was
mispredicted and/or how the branch instruction was resolved
out of the last number of times the branch instruction was
encountered. The branch history field 408 may include only
one bit providing information regarding the last time the
branch instruction was fetched, or several bits of
information providing information for many previous encounters
with the particular branch instruction. Information regarding
the resolution of each branch instruction is communicated
from the back-end 318 to the branch processing and
prediction logic 316 over a bus 340 (shown in FIG. 3) in order to
update the branch history each time a branch or other
program flow control instruction is resolved.

Using the branch history field 408, in one embodiment, all
conditional branch instructions which were mispredicted the
last time they were encountered are identified as being
unlikely to be predicted accurately. In embodiments in
which a more detailed branch history is kept, a branch
instruction may be identified as being unlikely to be
predicted accurately if the branch instruction was mispredicted
x out of the last y times it was executed, or, in other words,
the misprediction rate is higher than a predetermined
percentage.
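
The "x out of the last y" criterion, together with the rule that a branch with no entry is treated as unlikely to be predicted accurately, might be modeled as in the following Python sketch. The class name, the default depth and threshold, and the use of a bounded deque in place of a hardware history field are all assumptions made for illustration.

```python
from collections import deque


class BranchHistory:
    """Hypothetical model of a per-branch misprediction history field."""

    def __init__(self, depth=4, threshold=2):
        # depth plays the role of y, threshold the role of x.
        self.outcomes = deque(maxlen=depth)  # True means "was mispredicted"
        self.threshold = threshold

    def record(self, mispredicted):
        """Update the history when the branch is resolved."""
        self.outcomes.append(bool(mispredicted))

    def hard_to_predict(self):
        """Return True if the branch should be flagged as hard to predict."""
        if not self.outcomes:
            # Never encountered before: no history, so flag it.
            return True
        return sum(self.outcomes) >= self.threshold


history = BranchHistory(depth=4, threshold=2)
history.record(False)
history.record(True)
history.record(True)
flagged = history.hard_to_predict()
```

With `depth=1` and `threshold=1`, the same sketch reduces to the single-bit case in which a branch is flagged whenever it was mispredicted the last time it was encountered.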

The processor 101 may also use additional information in
determining that a particular conditional branch instruction,
or type of conditional branch instruction, is unlikely to be
predicted accurately. In one embodiment, additional flags
and context information are stored in the branch prediction
unit 336 in other fields (not shown) to provide more
information about the branch instruction and surrounding
instructions on which to base such a determination. For example,
in one embodiment, all branch instructions associated with
a server application may be considered to be unlikely to be
predicted accurately based on their context.

In another embodiment, the invention includes a counter
350 coupled to the stream management logic 109 which
counts a number of microprocessor clock cycles since the
particular branch instruction was last mispredicted. If the
number of clock cycles since the particular branch
instruction was last mispredicted is above a certain number, the
branch instruction is identified as being unlikely to be
predicted accurately.
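
A cycle counter of this kind could be modeled as follows. This Python sketch follows the wording of the paragraph above literally (the branch is flagged once the count exceeds a threshold); the class name, method names, and threshold value are hypothetical.

```python
class MispredictionCycleCounter:
    """Hypothetical model of a counter of cycles since the last misprediction."""

    def __init__(self, threshold_cycles):
        self.threshold_cycles = threshold_cycles
        self.cycles = 0

    def tick(self, cycles=1):
        """Advance the counter by a number of clock cycles."""
        self.cycles += cycles

    def on_misprediction(self):
        """Restart the count when the branch is mispredicted."""
        self.cycles = 0

    def hard_to_predict(self):
        """Flag the branch once the count is above the threshold."""
        return self.cycles > self.threshold_cycles


counter = MispredictionCycleCounter(threshold_cycles=100)
counter.tick(150)
flagged_by_counter = counter.hard_to_predict()
```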

Each of the above approaches may be used alone or in
combination with another approach to identify a particular
branch instruction as being unlikely to be predicted
accurately. The criteria used to identify a branch instruction that
is unlikely to be predicted accurately may be selected based
on the most commonly used software for a particular
computer system, characteristics common to software in general,
or can be directed toward a specific set of software programs
for which the processor is targeted. In this manner, the
invention provides the flexibility to meet general or specific
performance improvement requirements.

While the branch prediction unit 336 of the invention is
illustrated as one table including multiple fields, it is
appreciated by those of ordinary skill in the art that the branch
prediction unit 336 may include multiple buffers or other
logic providing for storage and processing of similar
information.

The Stream Control Logic of One Embodiment

Referring back to FIG. 3, when a conditional branch
instruction is fetched, if the branch prediction unit 336
indicates that the conditional branch instruction is unlikely
to be predicted accurately, or information regarding the
branch instruction is not available in the branch prediction
unit 336, the fetch unit 304 of the processor 101
speculatively fetches instructions from both the target and
sequential child code sections if sufficient additional processor
front-end resources are available for front-end processing of
both instruction streams. The stream control logic 314 of the
invention controls the processing of the multiple instruction
streams including the target and sequential child code
sections, until the condition within the branch instruction is
resolved, and the correct instruction stream including the
correct code section(s) is identified. The stream control logic
314 operates to keep instructions flowing through the
instruction processing pipeline of the processor 101, without
requiring a significant increase in instruction processing
resources to provide for the concurrent processing of
multiple instruction streams.

In one embodiment, the stream control logic 314 includes
one or more stream tables 330. The stream table 330
operates to keep track of the multiple instruction pointers
used to control processing of multiple instruction streams in
the processor pipeline. A predetermined number of
instruction pointers is available to be associated with instruction
streams being processed in the instruction execution
pipeline. In the embodiment illustrated in FIG. 3, two instruction
pointers are included, although additional instruction
pointers may be provided such that more than two instruction
streams may be processed concurrently. The number of
instruction pointers available is one factor in determining the
maximum number of instruction streams which can be
"alive," or in process within the instruction processing
pipeline at one time in one embodiment. Thus, the number
of instruction pointers available to be associated with
instruction streams can be selected by the processor designer
to meet performance and hardware resource requirements of
a specific computer system in accordance with the invention.

In one embodiment, a new instruction stream is spawned
each time a conditional branch instruction is fetched and
identified as being unlikely to be predicted accurately
(assuming resources are available as discussed in more
detail below). In one embodiment, the newly spawned
instruction stream includes the code section (either the target
child code section or the sequential child code section)
which is predicted not to be executed. The code section
which is predicted to be executed is part of the instruction
stream which includes the parent code section, and is already
associated with an instruction pointer to track its processing
in the pipeline.

The stream table 330 of one embodiment is described in
more detail with reference to FIG. 5. The information in the
field 502 indicates the instruction pointer (IP) number or
otherwise identifies the particular instruction pointer
referenced in that particular entry of the stream table 330. In
alternate embodiments, the stream table 330 does not
include a field 502 for the IP number, and the IP number is
determined instead by its location in the stream table 330.

The stream table 330 also includes a field 504 for storing
a tag which uniquely identifies the instruction stream
associated with a particular instruction pointer such that the
instruction stream can be properly managed in the
instruction execution pipeline. In a simple case in which two
instruction streams are being processed in the processor
pipeline concurrently, the tag field 504 stores a simple tag
identifying one instruction stream associated with a first
instruction pointer as the sequential instruction stream
indicated by the branch instruction and the other instruction
stream associated with a second instruction pointer as the
target instruction stream indicated by the branch instruction.
Then, when the branch instruction is resolved, the
information in the tag field 504 identifies which instruction stream
will continue processing and which will be canceled.

In embodiments providing for concurrent execution of a
larger number of instruction streams, the tag 504 may
include several sub-fields providing additional information
for each instruction stream. This additional information may
include, for example, an address identifying the parent branch
instruction, whether the instruction stream associated with
the particular instruction pointer has spawned any child
instruction streams, and possibly the instruction pointers
associated with any child instruction streams such that
concurrent execution of multiple instruction streams may be
properly managed.

The stream table 330 may also include other fields 506
such as a field indicating whether the entry is valid and/or a
field indicating whether the instruction pointer is still alive
or whether the instruction stream associated with the
instruction pointer has been canceled. In this manner, the stream
table 330 is used during the instruction retirement process in
one embodiment to identify the instructions to be committed
to processor state.

A priority field 516 is also included in one embodiment
and is discussed in more detail below in the section
regarding management of processor resources.
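
The fields of the stream table 330 described above can be pictured
with a minimal Python sketch. The class names, method names, and the
choice of a string tag are assumptions made for illustration; only the
field roles (502, 504, 506, 516) and the default of two instruction
pointers come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamEntry:
    ip_number: int        # field 502: identifies the instruction pointer
    tag: str = ""         # field 504: e.g. "sequential" or "target"
    valid: bool = False   # one of the other fields 506
    alive: bool = False   # one of the other fields 506
    priority: int = 0     # field 516: relative priority of the stream


class StreamTable:
    """Hypothetical model of a stream table with a fixed number of IPs."""

    def __init__(self, num_ips: int = 2) -> None:
        self.entries: List[StreamEntry] = [StreamEntry(i) for i in range(num_ips)]

    def free_count(self) -> int:
        # Number of instruction pointers not associated with any stream.
        return sum(not e.valid for e in self.entries)

    def allocate(self, tag: str, priority: int = 0) -> Optional[StreamEntry]:
        # Associate the first free instruction pointer with a new stream.
        for e in self.entries:
            if not e.valid:
                e.tag, e.valid, e.alive, e.priority = tag, True, True, priority
                return e
        return None  # no instruction pointer available

    def cancel(self, ip_number: int) -> None:
        # Invalidate the entry so the instruction pointer can be reused.
        e = self.entries[ip_number]
        e.valid = e.alive = False
```

In this sketch an entry is freed simply by marking it invalid, which
mirrors the description of invalidating entries when a program flow
control instruction is resolved.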

The stream control logic 314 also includes control logic
332 in one embodiment. Control logic 332 performs many
functions including managing storage of information in the
stream table 330, invalidating entries in the stream table 330
when program flow control instructions are resolved,
determining stream control logic and processor resource
availability, and directing the use of hardware resources to
provide for concurrent processing of multiple instruction
streams as discussed below.

If the program flow control instruction of the code section
is a conditional branch instruction which is unresolved, the
branch prediction unit 336 helps in determining the next
instructions to be fetched. If the particular branch instruction
is identified as being unlikely to be predicted accurately as
discussed above, and sufficient processor resources are
available, instructions from both the target and sequential
child code sections will be fetched and at least partially
processed until the branch instruction is resolved.

In one embodiment, sufficient additional front-end
processor resources are available for concurrent processing of
instruction streams including both the sequential and target
child code sections if one additional IP is available as
indicated by information stored in the stream table 330. In
one embodiment, the control logic 332 determines the
availability of IPs in the stream table 330. In one
embodiment, this is the only point at which an assessment of
the availability of processor resources is made. Once it is
determined that sufficient processor resources are available,
concurrent processing of the target and sequential
instruction streams proceeds through the processor pipeline until
the conditional branch instruction is resolved.

In another embodiment, a second assessment of the
availability of processor resources is made by the control logic
332 once multiple instruction streams have been spawned,
and instructions from both streams are in the process of
being fetched and decoded. Processor back-end resource
availability is determined and instructions from both target
and sequential instruction streams are forwarded to the
back-end only if sufficient additional processor resources are
available for concurrent execution of both instruction
streams. One method of determining whether sufficient
back-end resources are available is by counting the number
of outstanding memory accesses already issued, or counting
the number of entries allocated into various execution and
retirement buffers. If sufficient back-end resources are not
available, only the instruction stream predicted to be
executed is forwarded to the back-end for further processing
and the other instruction stream is temporarily stored in the
instruction buffer 307 of FIG. 3 and thus, available for
immediate processing if a misprediction is later identified.
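
The two-stage resource assessment can be sketched as follows. This is
a hedged illustration only: the function names, the numeric limits, and
the decision to require both back-end counts to be under their limits
are assumptions (the description offers the two counts as alternative
methods).

```python
def front_end_available(free_instruction_pointers: int) -> bool:
    # Front-end test from the text: one additional IP must be available.
    return free_instruction_pointers >= 1


def back_end_available(outstanding_mem_accesses: int,
                       buffer_entries_used: int,
                       max_mem_accesses: int = 8,      # assumed limit
                       max_buffer_entries: int = 32    # assumed limit
                       ) -> bool:
    # Assumed policy: both counts must be below their limits.
    return (outstanding_mem_accesses < max_mem_accesses
            and buffer_entries_used < max_buffer_entries)


def dispatch_plan(unlikely_to_predict: bool,
                  free_instruction_pointers: int,
                  outstanding_mem_accesses: int,
                  buffer_entries_used: int) -> str:
    """Describe what happens to the two child code sections."""
    if not (unlikely_to_predict and front_end_available(free_instruction_pointers)):
        return "fetch predicted stream only"
    if back_end_available(outstanding_mem_accesses, buffer_entries_used):
        return "forward both streams to back-end"
    # Front-end forked, but the back-end is too busy for both streams.
    return "forward predicted stream; hold other stream in instruction buffer"
```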

Referring back to the stream table 330 of FIG. 5, if
forking is indicated (the branch instruction being evaluated
is unlikely to be predicted accurately and adequate resources
are available), the stream table 330 is updated for the new IP
entry as discussed above with an appropriate tag for the new
instruction stream.

The stream table 330 may also be updated with the
appropriate tag if instructions are fetched from only the code
section predicted to be processed. Instructions are fetched
only from the code section predicted to be processed if a
branch instruction is not identified as being unlikely to be
predicted accurately or if adequate resources are not
available as discussed above.

In one embodiment, if instructions from both the target
and sequential child code sections indicated by a particular
branch instruction are speculatively fetched and
speculatively processed as discussed above, processing of both code
sections continues until the condition indicated by the
branch instruction is resolved. Processing of speculatively
fetched instructions from both the target and sequential child
code sections or instruction streams may proceed through all
aspects of instruction processing, including execution, and
up to, but not including, retirement.

In one embodiment, once the branch instruction is
resolved, processing of instructions from the "incorrect"
code section is aborted without regard to the processing
stage of the instructions, and the entries in the stream table
330 corresponding to these code sections are invalidated by
the control logic 332. In other embodiments, the instructions
from both the target and sequential child code sections are
processed through all processing stages prior to retirement,
and then only instructions from the "correct" instruction
stream including the correct code section are retired and
committed to architectural state. The IP associated with the
incorrect instruction stream is then freed up for use by
another code section. The IP may be freed up by indicating
that the entry is invalid, for example. Also in one
embodiment, once the conditional branch instruction is
resolved, the incorrect instruction stream is identified as not
being live any longer and no additional instructions from
that particular instruction stream are fetched.

Once instruction execution is completed, instructions in
the program flow are retired in program order. Results
produced by speculatively executed instructions from a code
section identified as being an "incorrect" code section, and
thus, invalid, are not committed to the processor state. In this
manner, the speculatively executed instructions from the
incorrect code section do not affect the architectural state of
the processor.
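
The effect of branch resolution on retirement can be illustrated with
a small Python sketch. The tuple layout and the stream tags "parent",
"target", and "sequential" are hypothetical conveniences; the sketch
only shows that results from the incorrect code section are discarded
and the remaining results are committed in program order.

```python
from typing import List, Tuple

# Each executed instruction: (program_order, stream_tag, result).
Executed = Tuple[int, str, str]


def resolve_and_retire(executed: List[Executed], taken: bool) -> List[str]:
    """Return the results committed to architectural state, in program order."""
    correct = "target" if taken else "sequential"
    # Instructions from the incorrect child code section are invalid and
    # are never committed; parent instructions are always kept.
    survivors = [ins for ins in executed if ins[1] in ("parent", correct)]
    survivors.sort(key=lambda ins: ins[0])  # retire in program order
    return [result for _, _, result in survivors]


if __name__ == "__main__":
    pipeline = [
        (2, "target", "t0"),
        (0, "parent", "p0"),
        (2, "sequential", "s0"),
        (1, "parent", "p1"),
        (3, "target", "t1"),
    ]
    print(resolve_and_retire(pipeline, taken=True))
```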



By executing instructions from instruction streams
including both the target and sequential child code sections
for conditional branch instructions identified as being
unlikely to be predicted accurately, the invention reduces or
eliminates the performance penalty associated with branch
mispredictions. In accordance with the invention, once the
conditional branch instruction is resolved and the correct
code section is identified, further processing of the incorrect
code section is aborted. Instructions from the correct code
section are already in process in the pipeline and instructions
from the incorrect code section which are in process in the
pipeline are invalidated. In this manner, the delay caused by
flushing the pipeline upon identifying a misprediction is
avoided for conditional branch instructions identified as
being unlikely to be predicted accurately.

Management of Instruction Processing Resources 

The instruction processing resources of the instruction
execution pipeline of the processor 101 are managed, as
discussed above, to provide for the concurrent execution of
multiple instruction streams without requiring a proportional
increase in instruction processing hardware resources. In
one embodiment of the invention, the processor instruction
execution resources are not increased beyond the resources
required to execute a single instruction stream along
predicted paths of branch instructions. In other embodiments,
particular stages of the instruction execution pipeline are
duplicated, or hardware resources in a particular unit are
increased to provide additional performance enhancements.
For example, the execution logic 320 of one embodiment
includes a set of execution units such that several
instructions may be processed through the instruction execution
phase of the pipeline concurrently. The determination of
hardware resource requirements in one embodiment is
based on space, cost, and performance trade-offs.

In one embodiment, processing of multiple streams is
time multiplexed such that the instruction processing
resources can be effectively shared. Hardware resources are
thus alternately dedicated to the various instruction streams
in process in the instruction processing pipeline such that the
instruction streams proceed through the pipeline stages in
parallel. Time-multiplexing of the hardware instruction
processing resources is managed by the control logic 332 in one
embodiment. Any number of approaches may be used to
implement time-multiplexing in accordance with the present
invention.

In another embodiment, the stream table 330 includes an
additional field 516 for storing information indicating a
relative priority of a particular live code section in relation
to other code sections being processed in the instruction
execution pipeline. In one embodiment, where instructions
are fetched from both the target and sequential child code
sections indicated by a particular conditional branch
instruction, the child code section predicted to be executed
by the branch prediction logic 336 is identified as having a
higher priority than the other child code section. For
example, if the branch prediction unit 336 predicts that the
branch being evaluated will be taken (even though the
branch prediction unit 336 also indicates that resolution of
the branch instruction is unlikely to be predicted accurately),
the target child code section is identified as having a higher
priority than the sequential child code section. In this
manner, more of the instruction processing resources in the
instruction execution pipeline may be directed to the higher
priority instruction stream including the higher priority code
section.
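
One possible way to combine time-multiplexing with the priority field
516 is a weighted round-robin over the live streams. The sketch below
is only one of the "any number of approaches" mentioned above; the
credit-based scheme, the function name, and the 2:1 weighting in the
usage example are assumptions, not part of the described embodiments.
Priorities are assumed to be positive integers.

```python
from typing import Dict, List


def fetch_schedule(priorities: Dict[str, int], cycles: int) -> List[str]:
    """Assign each cycle to one stream, in proportion to its priority."""
    credit = {name: 0 for name in priorities}
    total = sum(priorities.values())
    schedule: List[str] = []
    for _ in range(cycles):
        # Every live stream earns credit equal to its priority each cycle.
        for name, weight in priorities.items():
            credit[name] += weight
        # The stream with the most credit gets the shared resource this
        # cycle and pays back one full round of credit.
        chosen = max(credit, key=lambda n: credit[n])
        credit[chosen] -= total
        schedule.append(chosen)
    return schedule


if __name__ == "__main__":
    # Predicted-taken branch: the target stream gets the higher priority.
    print(fetch_schedule({"target": 2, "sequential": 1}, cycles=6))
```

With these weights the higher priority stream receives two of every
three cycles while the lower priority stream still makes progress.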

In an alternate embodiment, the priority indicator
associated with a particular code section is monitored by the
control logic 332 during processing. Additional information
available during processing of the program may be used in
some embodiments to dynamically adjust the priority
assigned to a particular code section to respond to changing
events. For example, a condition affecting the predicted
resolution of an unresolved conditional branch instruction
may cause the control logic 332 to switch the relative
priority of a target child code section with that of a
sequential child code section.

It will be appreciated by those of ordinary skill in the art
that other approaches for sharing instruction processing
resources are also within the scope of the invention. Thus,
the invention provides for efficient processing of instructions
in a program flow including conditional program flow
control instructions. By identifying conditional branch or
other program flow control instructions that are unlikely to
be predicted accurately, and executing down both the
sequential and target instruction streams, the invention
reduces or eliminates the performance penalty due to branch
mispredictions. Further, the invention manages concurrent
execution of multiple streams to reliably maintain the
processor state, and to provide for efficient processing of
multiple instruction streams concurrently without a
significant increase in hardware resources. The invention thus
provides for significant instruction processing performance
increases.

One Embodiment of the Method of the Invention 

The operation of the invention is further clarified with
reference to FIGS. 6A and 6B, which illustrate the method of
one embodiment of the invention beginning in processing
block 602. At step 604, an instruction pointer (IP) is updated,
and at steps 606 and 608, which may be performed in
parallel in some embodiments, the instruction or instructions
indicated by the IP are fetched, and the branch prediction
table is referenced for branch prediction information where
appropriate. At this point, if the instruction is a branch
instruction, and the branch prediction table includes an entry
for the branch instruction, it will indicate whether the branch
is predicted to be taken or not taken. At step 610, it is
determined whether the fetched instruction(s) is a branch
instruction. This determination may be the result of the
instruction decoder or it could be a guess by the machine
based on the address of the instruction. If it is not a branch
instruction, the instruction is executed at step 627, and is
processed through the remainder of the instruction
processing pipeline.

If the instruction is a branch instruction, then at step 612,
it is determined whether the branch instruction is unlikely to
be predicted accurately. Step 612 may alternately be
completed at the time that the branch prediction logic is
referenced at step 608. If the branch instruction does not have a
high likelihood of being mispredicted as indicated by the
branch prediction logic, then at step 615, instructions are
fetched from the code section predicted by the branch
prediction logic to be executed, processed by the processor
front-end and forwarded to the processor back-end for
further processing in step 616. Processing of the instructions
then continues at step 627.

If the branch instruction is identified as being unlikely to
be predicted accurately at step 612, then at step 614 in one
embodiment, it is determined whether adequate additional
processor front-end resources are available to process
instructions from both the target and sequential child code
sections. If adequate processor front-end resources are not
available, then in step 615, instructions are fetched from the
predicted stream only, the fetched instructions are processed
by the processor front-end and in step 616, are forwarded to
the processor back-end. Processing continues at step 627.



Referring back to decision block 614, if adequate
processor front-end resources are available, then in step 617,
instructions are fetched from both the target and sequential
instruction streams which are both then processed by the
processor front-end. In one embodiment, this processing
includes decoding the instructions and temporarily storing
one or both of the instruction streams in an instruction buffer
as described above.

At decision block 618, the availability of additional
processor back-end resources is determined. If adequate
additional processor back-end resources are not available to
process both target and sequential child instruction streams,
then in step 616, only the instruction stream predicted to be
executed is forwarded to the processor back-end and
processing continues at step 627. Referring back to decision
block 618, if adequate processor back-end resources are
available to process both the target and sequential child
instruction streams, then both streams are forwarded to the
back-end either directly or from the instruction buffer and
processing continues in step 619. In one embodiment,
adequate resources are available if one instruction pointer is
available and two section names are available as indicated
by the stream table.

In an alternate embodiment, when a branch instruction is
encountered, overall processor resources are evaluated to
determine whether to fork to fetch down both the target and
sequential child instruction streams. In this embodiment,
processing of both instruction streams continues until the
branch instruction spawning the child instruction streams is
resolved. In this manner, evaluation of processor resources
is not divided into evaluation of processor front-end
resources and evaluation of processor back-end resources.

At step 619, the rename table is copied, or in
embodiments which do not rename registers, the instruction stream
is tagged appropriately. At step 620, the stream table is
updated to associate a free instruction pointer with the newly
spawned instruction stream and/or other fields are updated
as needed.
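
The fetch-side portion of the method (steps 604 through 620, ending at
the execute step 627) can be traced with a short Python sketch. The
function name and its boolean parameters are hypothetical; the step
numbers and their ordering are those given in the description of
FIGS. 6A and 6B above.

```python
from typing import List


def fetch_path(is_branch: bool, unlikely: bool,
               front_end_ok: bool, back_end_ok: bool) -> List[int]:
    """Return the sequence of step numbers visited for one fetched instruction."""
    steps = [604, 606, 608, 610]       # update IP, fetch, look up prediction, test
    if not is_branch:
        return steps + [627]           # non-branch: go straight to execution
    steps.append(612)                  # unlikely to be predicted accurately?
    if not unlikely:
        return steps + [615, 616, 627] # follow the predicted stream only
    steps.append(614)                  # front-end resources available?
    if not front_end_ok:
        return steps + [615, 616, 627]
    steps += [617, 618]                # fetch both streams, check back-end
    if not back_end_ok:
        return steps + [616, 627]      # forward predicted stream only
    return steps + [619, 620, 627]     # copy rename table, update stream table
```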

At step 627, instructions are executed including
speculatively fetched instructions which are part of a forked
instruction stream for which the parent branch instruction has not
been resolved. Instructions which have been executed are
sent to the retirement buffer in step 628 and the instruction
retirement process begins at step 629.

The steps following step 629 are performed for each
instruction to be retired in parallel. At step 630, it is
determined whether the instruction(s) is a speculatively
fetched instruction. If not, then processing continues at step
650 where valid executed instructions are retired. At
decision block 652, if there are more instructions to be retired,
processing continues back at processing step 630.
Otherwise, the method ends at step 654.

Referring back to decision block 630, if the instruction(s)
is a speculatively fetched instruction, then at decision block
634, it is determined whether the instruction is from an
instruction stream that was forked (because the parent
branch instruction was identified as being unlikely to be
predicted accurately). If not, then at decision block 636, it is
determined whether the processor mispredicted resolution of
the parent branch instruction. If not, at step 642, the branch
history is updated and processing continues at step 650.
Referring back to decision block 636, if resolution of the
parent branch instruction was mispredicted, in step 638, the
branch history is updated to reflect the resolution of the
parent branch instruction and in step 644, the pipeline or
portion of the pipeline is flushed as discussed above.
Processing then continues at step 652.

Referring back to decision block 634, if the instruction is
part of a child instruction stream spawned by a parent branch
instruction that was forked, then in step 640, the branch
history is updated. In step 646, the instruction is canceled if
it is from the incorrect stream and resources dedicated to the
canceled instruction are reclaimed. In step 650, validly
executed instructions are retired in program order and
committed to architectural state. At decision block 652, it is
determined whether there are more instructions to retire and
if not, the method of one embodiment of the invention ends
at step 654.
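
The retirement-side decisions (steps 630 through 652) can be traced in
the same way. The sketch below is an interpretation, not a definitive
reading of FIG. 6B: the function name and parameters are hypothetical,
and it assumes that an instruction canceled at step 646 bypasses the
retirement step 650.

```python
from typing import List, Tuple


def retire_path(speculative: bool, forked: bool,
                mispredicted: bool = False,
                from_incorrect_stream: bool = False) -> Tuple[List[int], bool]:
    """Return (steps visited, whether the instruction is retired)."""
    steps = [630]                              # speculatively fetched?
    if not speculative:
        return steps + [650, 652], True        # retire directly
    steps.append(634)                          # from a forked stream?
    if forked:
        steps += [640, 646]                    # update history; cancel if incorrect
        if from_incorrect_stream:
            return steps + [652], False        # canceled, resources reclaimed
        return steps + [650, 652], True
    steps.append(636)                          # was the parent branch mispredicted?
    if mispredicted:
        return steps + [638, 644, 652], False  # update history, flush pipeline
    return steps + [642, 650, 652], True       # update history, retire
```

The contrast between the two speculative branches of this function
reflects the benefit described above: a forked stream is resolved by
canceling individual instructions, whereas a mispredicted unforked
branch requires a flush at step 644.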

It is understood by those of ordinary skill in the art that in
a pipelined processor, instructions will be at various stages
of execution concurrently. Thus, various steps of the method
of the invention are also performed concurrently and
continuously.

Thus, a processor and method for efficient execution of
instructions in a program flow is described. Whereas many
alterations and modifications of the invention will be
appreciated by one of ordinary skill in the art after having read the
foregoing description, it is understood that the particular
embodiments shown and described by way of illustration are
in no way intended to be considered limiting. Therefore,
references to details of the individual embodiments are not
intended to limit the scope of the claims which in
themselves recite only those features regarded as essential
to the invention.

We claim: 

1. A microprocessor for processing instructions including
a branch instruction having a condition to be resolved, a first
code section to be processed if the condition is resolved to
be met, and a second code section to be processed if the
condition is resolved to be not met, the microprocessor
comprising:

a fetch unit for fetching instructions from a memory;

branch prediction logic coupled to the fetch unit that
predicts the resolution of the condition and determines
whether the resolution of the condition is unlikely to be
predicted accurately; and

stream management logic responsive to the branch
prediction logic that directs speculative processing of
instructions from both the first and second code
sections prior to resolution of the condition if the
resolution of the condition is determined to be unlikely to be
predicted accurately.

2. The microprocessor as set forth in claim 1 wherein, if
the branch prediction logic determines that the resolution of
the condition is not unlikely to be predicted accurately, the
stream management logic directs speculative processing of
the first code section if the prediction logic predicts that the
condition will be resolved to be met, and the second code
section if the prediction logic predicts that the condition will
be resolved to be not met.

3. The microprocessor as set forth in claim 1 wherein the
branch prediction logic includes a buffer that indicates a
predicted resolution of a condition for each of a plurality of
branch instructions and an address of each branch
instruction for which a predicted resolution is indicated, and
wherein the branch prediction logic determines that
resolution of the condition is unlikely to be predicted accurately if
the address of the branch instruction is not indicated in the
buffer at a time that the branch instruction is fetched.

4. The microprocessor as set forth in claim 1 wherein the
branch prediction logic further indicates whether resolution
of the condition was mispredicted a last time the branch
instruction was fetched, and wherein the branch prediction
logic determines that resolution of the condition is unlikely
to be predicted accurately when resolution of the condition
was mispredicted the last time the branch instruction was
fetched.

5. The microprocessor as set forth in claim 1 wherein the
branch prediction logic further indicates a number of times
the resolution of the condition has been mispredicted out of
a number of times the branch instruction has been fetched,
and wherein the branch prediction logic determines that
resolution of the condition is unlikely to be predicted
accurately if the resolution of the condition has been
mispredicted a predetermined percentage of the number of times
the branch instruction has been fetched.

6. The microprocessor as set forth in claim 1 wherein the
branch instruction is in a program and the branch prediction
logic determines whether the resolution of the condition is
unlikely to be predicted accurately based on a context of the
branch instruction in the program.

7. The microprocessor as set forth in claim 1 further
including a counter coupled to the stream management logic
that counts a number of clock cycles since the branch
prediction logic last mispredicted the resolution of a
condition, and wherein the branch prediction logic
determines that the resolution of the condition is unlikely to be
predicted accurately if the counter is greater than a first
predetermined number.

8. The microprocessor as set forth in claim 1 wherein the
stream management logic tracks concurrent processing of a
plurality of instruction streams up to a first predetermined
maximum number of instruction streams, each instruction
stream including a plurality of code sections.

9. The microprocessor as set forth in claim 8 wherein the
stream management logic is further responsive to
availability of processor resources, the stream management logic
directing speculative processing of instructions from both
the first and second code sections prior to resolution of the
condition if a number of instruction streams being
concurrently processed is one less than the first predetermined
maximum number.

10. The microprocessor as set forth in claim 1 wherein the
stream management logic indicates a priority for speculative
processing of each of the first and second code sections, the
priority of the first code section being higher than the
priority of the second code section if the branch prediction
logic predicts that the condition will be resolved to be met,
the priority of the first code section being lower than the
priority of the second code section if the branch prediction
logic predicts that the condition will be resolved to be not
met.

11. In a microprocessor, a method for processing
instructions in a program including a branch instruction having a
condition to be resolved, a first code section to be processed
if the condition is resolved to be met, and a second code
section to be processed if the condition is resolved to be not
met, the method comprising the steps of:

fetching the branch instruction;

predicting whether the condition will be resolved to be
met;

determining whether the resolution of the condition is
unlikely to be predicted accurately; and

forking the program flow to speculatively process
instructions from both the first and second code sections if the
resolution of the condition is determined to be unlikely
to be predicted accurately.

12. The method as set forth in claim 11 further including
the following steps if the resolution of the condition is
determined not to be unlikely to be predicted accurately:

speculatively processing the first code section if the
condition is predicted to be resolved to be met; and

speculatively processing the second code section if the
condition is predicted to be resolved to be not met.

13. The method as set forth in claim 11 wherein the
resolution of the condition is determined to be unlikely to be
predicted accurately if the branch instruction has not been
fetched before.

14. The method as set forth in claim 11 further including 
a step of storing information indicating whether the branch 
instruction was mispredicted a last time the branch instruc- 
tion was fetched and wherein the resolution oF the condition jo 
is unlikely to be predicted accurately if the resolution of the 
condition was mispredicted when a last time the branch 
instruction was fetched. 

15. '^I'he method as set forth in claim LI further including 

a step of storing a misprediction percentage indicating a J5 
number of times the resolution of the condition has been 
mispredicted out of a number of times the branch instruction 
has been fetched, and wherein the resolution of the condition 
is determined to be unlikely to be predicted accurately when 
the misprediction percentage is higher than a predetermined 
percentage. 
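
As an illustration only, the three criteria recited in claims 13 through 15 could be tracked per branch address as sketched below. The class names, the 25% default threshold, and the decision to combine all three criteria in one table are assumptions made for the example; each of those claims recites its criterion separately.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BranchHistory:
    fetches: int = 0
    mispredictions: int = 0
    last_mispredicted: bool = False


class ConfidenceTable:
    """Hypothetical per-branch history keyed by branch instruction address."""

    def __init__(self, threshold_percent: float = 25.0):
        self.threshold_percent = threshold_percent  # assumed value
        self.entries: Dict[int, BranchHistory] = {}

    def unlikely_accurate(self, address: int) -> bool:
        entry = self.entries.get(address)
        if entry is None:
            return True  # branch has not been fetched before
        if entry.last_mispredicted:
            return True  # mispredicted the last time it was fetched
        # Misprediction percentage higher than the predetermined percentage.
        percent = 100.0 * entry.mispredictions / entry.fetches
        return percent > self.threshold_percent

    def record(self, address: int, mispredicted: bool) -> None:
        """Store the outcome once the condition has been resolved."""
        entry = self.entries.setdefault(address, BranchHistory())
        entry.fetches += 1
        entry.mispredictions += int(mispredicted)
        entry.last_mispredicted = mispredicted
```

An entry exists only after `record` has been called at least once, so the percentage is never computed with a zero fetch count.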

16. The method as set forth in claim 12 wherein the steps 
of forking and speculatively processing are controlled by 
stream management logic, the stream management logic 
controlling concurrent processing of a first predetermined 
maximum number of instruction streams. 

17. The method as set forth in claim 16 further including 
a step of assigning a first instruction pointer of a second 
predetermined maximum number of instruction pointers to a 
first instruction stream including the first code section and 
second instruction pointer to a second instruction stream 
including the second code section, each of the instruction 
pointers indicating an address of an instruction being pro- 
cessed in the corresponding instruction stream. 

18. The method as set forth in claim 17 further including 
the following steps: 

resolving the condition; 

aborting processing of the second code section if the 
condition is resolved to be met; 

aborting processing of the first code section if the 
condition is resolved to be not met; 

invalidating the instruction pointer associated with the 
aborted code section; 

completing processing of instructions in non-aborted code 
sections; and 

committing results of instructions in non-aborted code 
sections to architectural state. 

19. The method as set forth in claim 18 further including, 
prior to the forking step, a step of determining whether 
sufficient processor resources are available for forking, 
sufficient processor resources being available if one instruc- 
tion pointer is available, and wherein the step of forking is 
performed if the conditional branch instruction is unlikely to 
be predicted accurately and sufficient processor resources 
are available. 

20. The method as set forth in claim 19 further including 
the following steps if sufficient processor resources are 
determined not to be available: 

speculatively processing the first code section if the 
condition is predicted to be resolved to be met; and 

speculatively processing the second code section if the 
condition is predicted to be resolved to be not met. 
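
The interplay of claims 16 through 20 (a bounded pool of instruction pointers, forking only when a pointer is available, and invalidating the pointer of the aborted section on resolution) might be modeled as in the following sketch. Everything here is hypothetical: the class `StreamManager`, its method names, and the pool size of four are assumptions. As a simplification, the model allocates a pointer for the predicted section as well, rather than reusing the pointer of the stream that contained the branch.

```python
from typing import Dict, List, Optional


class StreamManager:
    """Tracks a fixed pool of instruction pointers, one per active stream."""

    def __init__(self, max_pointers: int = 4):  # pool size is an assumed value
        self.pointers: List[Optional[int]] = [None] * max_pointers

    def _allocate(self, address: int) -> Optional[int]:
        """Assign a free pointer to a stream starting at address, if one is free."""
        for slot, current in enumerate(self.pointers):
            if current is None:
                self.pointers[slot] = address
                return slot
        return None

    def process_branch(self, first_addr: int, second_addr: int,
                       predicted_met: bool, low_confidence: bool) -> Dict[str, int]:
        """Return a map from section name to the pointer slot processing it."""
        predicted = ("first", first_addr) if predicted_met else ("second", second_addr)
        other = ("second", second_addr) if predicted_met else ("first", first_addr)
        streams: Dict[str, int] = {}
        slot = self._allocate(predicted[1])
        if slot is None:
            return streams  # no pointer free at all in this model
        streams[predicted[0]] = slot
        if low_confidence:
            # Fork only if one more pointer is available; otherwise
            # only the predicted section is processed.
            extra = self._allocate(other[1])
            if extra is not None:
                streams[other[0]] = extra
        return streams

    def resolve(self, streams: Dict[str, int], condition_met: bool) -> Optional[int]:
        """Abort the wrong section, invalidate its pointer, return the surviving slot."""
        wrong = "second" if condition_met else "first"
        right = "first" if condition_met else "second"
        aborted = streams.pop(wrong, None)
        if aborted is not None:
            self.pointers[aborted] = None  # invalidate pointer of aborted section
        return streams.get(right)  # None if the correct section was never started
```

A return value of `None` from `resolve` corresponds to a misprediction without a fork: the correct section was never processed speculatively and would have to be fetched afresh. Committing results to architectural state is outside the scope of this sketch.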

21. A computer system comprising: 

a memory that stores instructions in a program flow, the 
program flow including a branch instruction having a 
condition to be resolved, a first code section to be 
processed if the condition is resolved to be met and a 
second code section to be processed if the condition is 
resolved to be not met; 

a bus coupled to the memory that communicates information; and 

a microprocessor including: 

a fetch unit that fetches instructions in the program flow 
from the memory, 

branch prediction logic coupled to the fetch unit that 
predicts the resolution of the condition and that 
determines whether the resolution of the condition is 
unlikely to be predicted accurately, and 

stream management logic responsive to the branch 
prediction logic that directs speculative processing 
of instructions from both the first and second code 
sections prior to resolution of the condition if the 
resolution of the condition is determined to be 
unlikely to be predicted accurately. 

22. The computer system as set forth in claim 21 wherein 
the branch prediction logic includes a buffer that indicates a 
predicted resolution of a condition for each of a plurality of 
branch instructions and an address of each branch instruc- 
tion for which a predicted resolution is indicated, and 
wherein the branch prediction logic determines that resolution 
of the condition is unlikely to be predicted accurately if 
the address of the branch instruction is not indicated in the 
buffer at a time that the branch instruction is fetched. 

23. The computer system as set forth in claim 21 wherein 
the branch prediction logic further indicates whether reso- 
lution of the condition was mispredicted a last time the 
branch instruction was fetched, and wherein the branch 
prediction logic determines that resolution of the condition 
is unlikely to be predicted accurately when resolution of the 
condition was mispredicted the last time the branch instruc- 
tion was fetched. 

24. The computer system as set forth in claim 21 wherein 
the branch prediction logic further indicates a number of 
times the resolution of the condition has been mispredicted 
out of a number of times the branch instruction has been 
fetched, and wherein the branch prediction logic determines 
that resolution of the condition is unlikely to be predicted 
accurately if the resolution of the condition has been mispre- 
dicted a predetermined percentage of the number of times 
the branch instruction has been fetched. 

25. The computer system as set forth in claim 21 wherein 
the branch instruction is in a program and the branch 
prediction logic determines whether the resolution of the 
condition is unlikely to be predicted accurately based on a 
context of the branch instruction in the program. 

26. The computer system as set forth in claim 21 wherein 
the stream management logic tracks concurrent processing 
of a plurality of instruction streams up to a first predeter- 
mined maximum number of instruction streams, each 
instruction stream including a plurality of code sections. 

27. The computer system as set forth in claim 26 wherein 
the stream management logic is further responsive to availability 
of processor resources, the stream management logic 
directing speculative processing of instructions from both 
the first and second code sections prior to resolution of the 
condition if a number of instruction streams being concurrently 
processed is one less than the first predetermined 
maximum number. 

28. The computer system as set forth in claim 21 wherein 
the stream management logic indicates a priority for speculative 
processing of each of the first and second code 
sections, the priority of the first code section being higher 
than the priority of the second code section if the branch 
prediction logic predicts that the condition will be resolved 
to be met, the priority of the first code section being lower 
than the priority of the second code section if the branch 
prediction logic predicts that the condition will be resolved 
to be not met. 

29. The computer system as set forth in claim 21 wherein 
the stream management logic aborts speculative processing 
of the second code section if the condition is resolved to be 
met, and aborts speculative processing of the first code 
section if the condition is resolved to be not met. 

* * * * * 